Object-relational impedance mismatch

The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is used by a program written in an object-oriented programming language or style, particularly when objects or class definitions are mapped in a straightforward way to database tables or relational schemata.

The term object-relational impedance mismatch is derived from the electrical engineering term impedance matching.

Mismatches

Object-oriented concepts

Encapsulation

Object-oriented programs are designed around encapsulated objects whose internal representation is hidden behind methods. According to object-oriented programming (OOP) philosophy, mapping such private representations to database tables makes the resulting databases fragile: the private representation of an object is subject to far fewer design constraints than a database's public data, which must remain amenable to upgrade, inspection and queries.

RDBMSes tend to use rule-based and role-based protection and security mechanisms instead of direct interface restrictions. It could be said that OOP encapsulation uses "additive" security and protection control mechanisms, while RDBMSes tend to use "subtractive" mechanisms. A data item that is part of an RDBMS is automatically assigned a default set of relational and database operations, and any restrictions needed are usually imposed by removing operations incrementally. For example, non-managers may be denied the ability to delete a record in the managers' tables. Encapsulation, on the other hand, assumes that a given object offers no access to the outside world until an interface is explicitly built to provide it. (However, most RDBMSes offer "stored procedures", which share some characteristics with OOP-style encapsulation.)

Invariance

In particular, encapsulation incorporates the more general concept of an invariant, which is used in object-oriented (OO) modeling. Invariants are not easily represented in relational databases (although this may depend on how the term "invariant" is interpreted for the process being modeled or implemented).
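As a rough illustration, the sketch below states one simple invariant (a balance is never negative) twice: once as imperative checks inside an encapsulated class, and once as a declarative CHECK constraint. The `Account` class, the `account` table and the use of SQLite through Python's `sqlite3` module are assumptions made for this example. A single-row rule like this one is among the easier cases; invariants that span several objects or rows are considerably harder to express.

```python
import sqlite3


class Account:
    """Hypothetical class whose invariant is: balance is never negative."""

    def __init__(self, balance: int = 0) -> None:
        if balance < 0:
            raise ValueError("balance must be non-negative")
        self._balance = balance  # private representation

    @property
    def balance(self) -> int:
        return self._balance

    def withdraw(self, amount: int) -> None:
        # The invariant is protected by imperative code in every mutator.
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


def try_negative_update(conn: sqlite3.Connection) -> bool:
    """Return True if the database rejected an update that breaks the rule."""
    try:
        conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    except sqlite3.IntegrityError:
        return True
    return False


conn = sqlite3.connect(":memory:")
# The same rule stated declaratively; it binds every writer, not just Account.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
rejected = try_negative_update(conn)
```

In the class, the rule lives in code that every mutator must remember to run; in the table, it is declared once and applies to any program that writes to the data.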

Accessibility

In relational thinking, "private" versus "public" access is relative to need rather than being an absolute characteristic of the data's state, as in the OO model. The relational and OO models often have conflicts over relativity versus absolutism of classifications and characteristics.

Interface, class, inheritance and polymorphism

Access to objects in object-oriented programs is allegedly best performed via interfaces that together provide the only access to the internals of an object. The relational model, on the other hand, utilizes derived relation variables (views) to provide varying perspectives and constraints to ensure integrity. Similarly, essential OOP concepts for classes of objects, inheritance and polymorphism are not supported by relational database systems.

Mapping to relational concepts

A proper mapping between relational concepts and object-oriented concepts can be made if relational database tables are linked to associations found in object-oriented analysis.

Data type differences

A major mismatch between existing relational and OO languages lies in their type systems. The relational model strictly prohibits by-reference attributes (or pointers), whereas OO languages embrace and expect by-reference behavior. Scalar types and their operator semantics also often differ between the models, sometimes subtly and sometimes vastly, causing problems in mapping.
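The by-reference difference can be sketched as follows. In the object version two `Person` objects hold the very same `Address` object; in the relational version each person row holds a key value that must be joined to find the address. All class, table and column names here are hypothetical, and SQLite via Python's `sqlite3` module is used only as a convenient stand-in for an SQL system.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Address:
    city: str


@dataclass
class Person:
    name: str
    address: Address  # by-reference attribute: holds the object itself


shared = Address("Oslo")
alice = Person("Alice", shared)
bob = Person("Bob", shared)
shared.city = "Bergen"  # visible through both references at once

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE person (
        name TEXT PRIMARY KEY,
        address_id INTEGER NOT NULL REFERENCES address(id)  -- a value, not a pointer
    );
    INSERT INTO address VALUES (1, 'Oslo');
    INSERT INTO person VALUES ('Alice', 1), ('Bob', 1);
    UPDATE address SET city = 'Bergen' WHERE id = 1;
    """
)
# The association is recovered by comparing values in a join.
cities = conn.execute(
    "SELECT p.name, a.city FROM person AS p "
    "JOIN address AS a ON a.id = p.address_id ORDER BY p.name"
).fetchall()
```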

For example, most SQL systems support string types with varying collations and constrained maximum lengths (open-ended text types tend to hinder performance), while most OO languages consider collation only as an argument to sort routines, and strings are intrinsically sized to available memory. A more subtle but related example is that SQL systems often ignore trailing white space in a string for the purposes of comparison, whereas OO string libraries do not. In an OO language it is also typically not possible to construct a new data type simply by constraining the possible values of another primitive type.
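The trailing white space point can be seen in a small experiment. The sketch assumes SQLite (through Python's `sqlite3` module), whose default comparison is exact; its built-in `RTRIM` collation is used here to approximate the space-insensitive comparison that many SQL systems apply to fixed-length character columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Python compares strings exactly, so trailing spaces matter.
python_equal = ("abc " == "abc")

# SQLite's default BINARY collation is also exact ...
binary_equal = conn.execute("SELECT 'abc ' = 'abc'").fetchone()[0]

# ... but its built-in RTRIM collation ignores trailing spaces.
rtrim_equal = conn.execute(
    "SELECT 'abc ' = 'abc' COLLATE RTRIM"
).fetchone()[0]
```

A mapping layer that round-trips strings through such a column may therefore find that two values the database treats as equal are unequal once loaded into application objects.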

Structural and integrity differences

Another mismatch has to do with the differences in the structural and integrity aspects of the contrasted models. In OO languages, objects can be composed of other objects, often to a high degree, or specialize from a more general definition. This may make the mapping to relational schemas less straightforward, because relational data tends to be represented in a named set of global, unnested relation variables. Relations themselves, being sets of tuples all conforming to the same header, do not have an ideal counterpart in OO languages. Constraints in OO languages are generally not declared as such, but are manifested as exception-raising protection logic surrounding encapsulated internal data. The relational model, on the other hand, calls for declarative constraints on scalar types, attributes, relation variables, and the database as a whole.
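A minimal sketch of the structural difference, under assumed names: a nested `Order` object that contains its `LineItem` objects has to be flattened into rows of two unnested tables, and the containment becomes a key column plus declarative constraints. SQLite via Python's `sqlite3` module is an assumption of the example.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    items: list[LineItem] = field(default_factory=list)  # nested composition


def save_order(conn: sqlite3.Connection, order: Order) -> None:
    """Flatten one nested Order into rows of two unnested tables."""
    conn.execute("INSERT INTO orders (order_id) VALUES (?)", (order.order_id,))
    conn.executemany(
        "INSERT INTO line_item (order_id, sku, quantity) VALUES (?, ?, ?)",
        [(order.order_id, item.sku, item.quantity) for item in order.items],
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE line_item (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        sku TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0),  -- declarative constraint
        PRIMARY KEY (order_id, sku)
    );
    """
)
save_order(conn, Order(1, [LineItem("A-1", 2), LineItem("B-7", 1)]))
rows = conn.execute(
    "SELECT order_id, sku, quantity FROM line_item ORDER BY sku"
).fetchall()
```

Loading the order back requires the reverse step: querying both tables and reassembling the nested object from the matching rows.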

Manipulative differences

The semantic differences are especially apparent in the manipulative aspects of the contrasted models. The relational model has an intrinsic, relatively small and well-defined set of primitive operators for querying and manipulating data, whereas OO languages generally handle query and manipulation through custom-built or lower-level imperative operations that are specific to the case and to the physical access path. Some OO languages do have support for declarative query sub-languages, but because OO languages typically deal with lists and perhaps hash tables, the manipulative primitives are necessarily distinct from the set-based operations of the relational model.
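The contrast can be sketched by computing the same per-department total in both styles. The sample data and the `employee` table are invented for the illustration, and SQLite via Python's `sqlite3` module stands in for a relational system.

```python
import sqlite3

employees = [
    {"name": "Ann", "dept": "R&D", "salary": 120},
    {"name": "Bo", "dept": "R&D", "salary": 100},
    {"name": "Cy", "dept": "Ops", "salary": 90},
]

# Imperative style: the program spells out the traversal and the hash table.
totals_imperative = {}
for emp in employees:
    dept = emp["dept"]
    totals_imperative[dept] = totals_imperative.get(dept, 0) + emp["salary"]

# Relational style: one set-level expression; the access path is left
# to the database system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (:name, :dept, :salary)", employees
)
totals_declarative = dict(
    conn.execute("SELECT dept, SUM(salary) FROM employee GROUP BY dept")
)
```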

Transactional differences

The concurrency and transaction aspects also differ significantly. In particular, relational database transactions, as the smallest unit of work performed by databases, are much larger than any operations performed by classes in OO languages. Transactions in relational databases are dynamically bounded sets of arbitrary data manipulations, whereas the granularity of transactions in OO languages is typically individual assignments of primitive-typed fields. OO languages also typically have no analogue of isolation or durability, and atomicity and consistency are ensured only for those writes of primitive-typed fields.
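The difference in granularity can be illustrated with a transfer that fails halfway. In the sketch below, which assumes a hypothetical `Wallet` class and `wallet` table and uses SQLite through Python's `sqlite3` module, the object version is left half-applied, while the database version rolls both updates back together.

```python
import sqlite3


class Wallet:
    def __init__(self, balance: int) -> None:
        self.balance = balance


def transfer_objects(src: Wallet, dst: Wallet, amount: int) -> None:
    """Two separate field assignments; nothing undoes the first on failure."""
    src.balance -= amount
    if amount > 100:
        raise RuntimeError("simulated failure between the two writes")
    dst.balance += amount


def transfer_rows(conn: sqlite3.Connection, amount: int) -> None:
    """Both updates take effect together or not at all."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE wallet SET balance = balance - ? WHERE id = 1", (amount,)
            )
            if amount > 100:
                raise RuntimeError("simulated failure between the two writes")
            conn.execute(
                "UPDATE wallet SET balance = balance + ? WHERE id = 2", (amount,)
            )
    except RuntimeError:
        pass  # the rollback has already happened


a, b = Wallet(500), Wallet(0)
try:
    transfer_objects(a, b, 200)
except RuntimeError:
    pass  # a has been debited, b has not been credited

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallet (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO wallet VALUES (1, 500), (2, 0)")
conn.commit()
transfer_rows(conn, 200)
balances = conn.execute("SELECT balance FROM wallet ORDER BY id").fetchall()
```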

Solving impedance mismatch

Solving the impedance mismatch problem for object-oriented programs starts with recognition of the differences in the specific logic systems being employed, followed by either minimization of the mismatch or compensation for it.

Minimization

There have been some attempts at building object-oriented database management systems (OODBMS) that would avoid the impedance mismatch problem. They have been less successful in practice than relational databases, however, partly due to the limitations of OO principles as a basis for a data model[1]. There has also been research into extending the database-like capabilities of OO languages through such notions as transactional memory.

One common solution to the impedance mismatch problem is to layer the domain and framework logic. In this scheme, the OO language is used to model certain relational aspects at runtime rather than attempt the more static mapping. Frameworks which employ this method will typically have an analogue for a tuple, usually as a "row" in a "dataset" component or as a generic "entity instance" class, as well as an analogue for a relation. Advantages of this approach may include:

Disadvantages may include:

Alternative architectures

The rise of XML databases and XML client structures has motivated other alternative architectures to get around the impedance mismatch challenges. These architectures use XML technology in the client (such as XForms) and native XML databases on the server that use the XQuery language for data selection. This allows a single data model and a single data selection language (XPath) to be used in the client, in the rules engines and on the persistence server[2].

Compensation

The mixing of levels of discourse within OO application code presents problems, but there are some common mechanisms used to compensate. The biggest challenge is to provide framework support, that is, automation of data manipulation and presentation patterns, within the level of discourse in which the domain data is being modeled. To address this, reflection, code generation, or both are used. Reflection allows code (classes) to be addressed as data, and thus allows the transport, presentation, integrity, etc. of the data to be automated. Generation treats the entity structures as data inputs for code generation tools or meta-programming languages, which produce the classes and supporting infrastructure en masse. Both of these schemes may still be subject to certain anomalies where these levels of discourse merge. For instance, generated entity classes will typically have properties which map to the domain (e.g. Name, Address) as well as properties which provide state management and other framework infrastructure (e.g. IsModified).
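The reflection approach can be sketched as follows. The entity class reuses the property names mentioned above (Name, Address, IsModified); everything else, including the `FRAMEWORK_FIELDS` set used to separate domain properties from framework properties and the use of SQLite via Python's `sqlite3` module, is an assumption of this example and does not describe any particular framework.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Domain properties.
    Name: str
    Address: str
    # Framework state flag living on the same class; not persisted here.
    IsModified: bool = False


# Assumption: how this sketch tells the two kinds of property apart.
FRAMEWORK_FIELDS = {"IsModified"}


def persistent_columns(cls) -> list[str]:
    """Use reflection to list the fields that map to table columns."""
    return [f.name for f in fields(cls) if f.name not in FRAMEWORK_FIELDS]


def insert_sql(cls) -> str:
    """Build an INSERT statement from the class definition itself."""
    cols = persistent_columns(cls)
    placeholders = ", ".join("?" for _ in cols)
    return f"INSERT INTO {cls.__name__} ({', '.join(cols)}) VALUES ({placeholders})"


def save(conn: sqlite3.Connection, obj) -> None:
    cols = persistent_columns(type(obj))
    conn.execute(insert_sql(type(obj)), [getattr(obj, c) for c in cols])
    obj.IsModified = False  # framework bookkeeping mixed into the entity


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Name TEXT, Address TEXT)")
customer = Customer("Ada", "12 Mill Lane", IsModified=True)
save(conn, customer)
stored = conn.execute("SELECT Name, Address FROM Customer").fetchall()
```

The need to filter out `IsModified` by name is a small instance of the anomaly described above: domain and infrastructure properties share one class, so the automation has to be told which is which.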

Contention

The following are some contentions that have been raised:

Some, however, would point out that this contention is moot due to the fact that: (1) RDBMSes were never intended to facilitate object modelling, and (2) SQL generally should only be seen as a "lossy" or "inefficient" interface language when one is trying to achieve a solution for which RDBMSes were not designed. SQL is very efficient at doing what it was designed to do, namely, to query, sort, filter, and store large sets of data. Some would additionally point out that the inclusion of OO language functionality in the back-end simply facilitates bad architectural practice, as it admits high-level application logic into the data tier, antithetical to the RDBMS.

Philosophical differences

Key philosophical differences between the OO and relational models can be summarized as follows:

As a result of the object-relational impedance mismatch, it is often argued by partisans on both sides of the debate that the other technology ought to be abandoned or reduced in scope[3]. Some database advocates view traditional "procedural" languages as more compatible with an RDBMS than many OO languages, or suggest that a less OO style ought to be used. (In particular, it is argued that long-lived domain objects in application code ought not to exist; any such objects that do exist should be created when a query is made and disposed of when a transaction or task is complete.) On the other hand, many OO advocates argue that more OO-friendly persistence mechanisms, such as OODBMS, ought to be developed and used, and that relational technology ought to be phased out. Many (if not most) programmers and DBAs hold neither of these viewpoints, and view the object-relational impedance mismatch as a mere fact of life that information technology has to deal with.

It is also argued that O/R mapping pays off in some situations but is probably oversold: it has drawbacks as well as advantages. Skeptics point out that it is worth thinking carefully before using it, as it will add little value in some cases[4].

References

  1. C. J. Date, Relational Database Writings.
  2. Dan McCreary, "XRX: Simple, Elegant, Disruptive", XML.com.
  3. Ted Neward (2006-06-26), "The Vietnam of Computer Science", Interoperability Happens. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx. Retrieved 2010-06-02.
  4. Rod Johnson, J2EE Design and Development, Wrox Press, 2002, p. 256.